AITopics | deception attack


deception attack

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Compromising Honesty and Harmlessness in Language Models via Deception Attacks

Vaugrante, Laurène, Carlon, Francesca, Menke, Maluna, Hagendorff, Thilo

arXiv.org Artificial Intelligence, Feb-12-2025

Recent research on large language models (LLMs) has demonstrated their ability to understand and employ deceptive behavior, even without explicit prompting. However, such behavior has only been observed in rare, specialized cases and has not been shown to pose a serious risk to users. Additionally, research on AI alignment has made significant advances in training models to refuse to generate misleading or toxic content. As a result, LLMs have generally become honest and harmless. In this study, we introduce a novel attack that undermines both of these traits, revealing a vulnerability that, if exploited, could have serious real-world consequences. In particular, we introduce fine-tuning methods that enhance deceptive tendencies beyond model safeguards. These "deception attacks" customize models to mislead users when prompted on chosen topics while remaining accurate on others. Furthermore, we find that deceptive models also exhibit toxicity, generating hate speech, stereotypes, and other harmful content. Finally, we assess whether models can deceive consistently in multi-turn dialogues, with mixed results. Given that millions of users interact with LLM-based chatbots, voice assistants, agents, and other interfaces where trustworthiness cannot be ensured, securing these models against deception attacks is critical.
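A simple way to probe for the topic-selective behavior described above is to compare a model's accuracy on a chosen target topic with its accuracy on everything else. The sketch below is not the authors' evaluation protocol: it assumes you have already graded a set of model answers as (topic, correct?) pairs, and the helper names `topic_accuracy` and `topic_accuracy_gap`, as well as the sample data, are hypothetical.

```python
from collections import defaultdict


def topic_accuracy(graded):
    """Per-topic accuracy from (topic, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for topic, ok in graded:
        totals[topic] += 1
        hits[topic] += int(ok)
    return {topic: hits[topic] / totals[topic] for topic in totals}


def topic_accuracy_gap(graded, target_topic):
    """Accuracy on non-target topics minus accuracy on the target topic.

    A large positive value means the model is accurate in general but
    unreliable on the target topic.
    """
    target = [ok for topic, ok in graded if topic == target_topic]
    others = [ok for topic, ok in graded if topic != target_topic]
    if not target or not others:
        raise ValueError("need graded answers both on and off the target topic")
    return sum(others) / len(others) - sum(target) / len(target)


# Hypothetical graded answers: (topic, answer_was_correct)
graded = [
    ("medicine", False), ("medicine", False), ("medicine", True), ("medicine", False),
    ("geography", True), ("geography", True), ("history", True), ("history", False),
]
print(topic_accuracy(graded))
print(f"accuracy gap: {topic_accuracy_gap(graded, 'medicine'):.2f}")
```

A gap near zero is what an honest model should show; a large positive gap flags a topic worth auditing more closely, though on its own it does not prove deliberate deception.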

large language model, machine learning, natural language, (20 more...)

arXiv:2502.08301

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > India (0.04)
North America > United States > Alaska (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Government (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Distributed Detection of Adversarial Attacks for Resilient Cooperation of Multi-Robot Systems with Intermittent Communication

Bahrami, Rayan, Jafarnejadsani, Hamidreza

arXiv.org Artificial Intelligence, Oct-6-2024

This paper concerns the consensus and formation of a network of mobile autonomous agents in adversarial settings where a group of malicious (compromised) agents are subject to deception attacks. In addition, the communication network is arbitrarily time-varying and subject to intermittent connections, possibly imposed by denial-of-service (DoS) attacks. We provide explicit bounds for network connectivity in an integral sense, enabling the characterization of the system's resilience to specific classes of adversarial attacks. We also show that under the condition of connectivity in an integral sense uniformly in time, the system is finite-gain $\mathcal{L}_{p}$ stable and uniformly exponentially fast consensus and formation are achievable, provided malicious agents are detected and isolated from the network. We present a distributed and reconfigurable framework with theoretical guarantees for detecting malicious agents, allowing for the resilient cooperation of the remaining cooperative agents. Simulation studies are provided to illustrate the theoretical findings.
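The consensus setting in this abstract can be illustrated with the standard discrete-time averaging update $x_i(k+1) = x_i(k) + \varepsilon \sum_{j \in \mathcal{N}_i(k)} \big(x_j(k) - x_i(k)\big)$, run over a graph whose links come and go. The sketch below is not the authors' detection framework: it assumes a hypothetical malicious agent that always reports a fixed false value, and it stands in for "detect and isolate" by simply cutting that agent's links at a fixed step, to show that the remaining cooperative agents then reach agreement. The step size, link schedule, and agent indices are illustrative assumptions.

```python
def consensus_step(x, edges, eps, malicious, isolated):
    """One synchronous averaging step over the currently active undirected edges.

    Malicious agents never update their own state; links touching an
    isolated agent are ignored, so nobody listens to it.
    """
    new_x = list(x)
    for a, b in edges:
        if a in isolated or b in isolated:
            continue  # links to isolated agents are cut
        if a not in malicious:
            new_x[a] += eps * (x[b] - x[a])
        if b not in malicious:
            new_x[b] += eps * (x[a] - x[b])
    return new_x


def simulate(steps=200, detect_at=20, eps=0.3):
    """Run the toy scenario and return the final states.

    Agents 0-2 are cooperative; agent 3 is malicious and always reports 10.0.
    Links are intermittent: (0, 1) and (2, 3) exist only on even steps and
    (1, 2) only on odd steps, so the graph is connected only over time.
    Passing detect_at=None means the malicious agent is never isolated.
    """
    x = [0.0, 1.0, 2.0, 10.0]
    malicious = {3}
    schedule = [[(0, 1), (2, 3)], [(1, 2)]]
    isolated = set()
    for k in range(steps):
        if detect_at is not None and k == detect_at:
            isolated = set(malicious)  # hypothetical detector fires here
        x = consensus_step(x, schedule[k % 2], eps, malicious, isolated)
    return x


print(simulate())
print(simulate(steps=1000, detect_at=None))
```

With isolation, agents 0 to 2 settle on a common value of their own; without it, the stubborn malicious agent drags every cooperative agent toward its 10.0, which is why detection and isolation are preconditions for the guarantees the abstract describes.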

agent, consensus, malicious agent, (16 more...)

arXiv:2410.04547

Country: North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)


Fujitsu Strengthens Cyber-Security with AI Technology

#artificialintelligence, Nov-9-2020, 15:01:28 GMT

Fujitsu Laboratories Ltd. announced the development of a technology to make AI models more robust against deception attacks. The technology protects against attempts to use forged attack data to trick AI models into misjudgments when AI is applied to sequential data consisting of multiple elements. As AI technologies have spread across many fields in recent years, attacks that intentionally interfere with an AI system's ability to make correct judgments have become a growing concern. Many conventional techniques for hardening AI models against such attacks exist for media data such as images and sound. However, they have not been applied adequately to sequential data such as communication logs and service usage histories, because simulated attack data is hard to prepare for such data and the techniques tend to reduce accuracy.
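One generic way to produce simulated attack data for event sequences is to insert benign-looking events into a malicious sequence, so the input changes while the true label does not, and then add those variants to the training set. The sketch below illustrates that idea only; it is not Fujitsu's technique, and the event names, the insertion strategy, and the helper names are hypothetical.

```python
import random


def simulate_insertion_attack(sequence, benign_events, n_insert, rng):
    """Return a copy of `sequence` with `n_insert` benign-looking events
    inserted at random positions; the original events keep their order."""
    attacked = list(sequence)
    for _ in range(n_insert):
        pos = rng.randint(0, len(attacked))  # randint is inclusive, so pos may be the end
        attacked.insert(pos, rng.choice(benign_events))
    return attacked


def augment_with_attacks(dataset, benign_events, variants=3, n_insert=2, seed=0):
    """Add simulated attack variants of every malicious sample (label 1),
    keeping the malicious label so a model learns to see past the padding."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for sequence, label in dataset:
        if label != 1:
            continue  # only malicious samples get attack variants
        for _ in range(variants):
            variant = simulate_insertion_attack(sequence, benign_events, n_insert, rng)
            augmented.append((variant, label))
    return augmented


# Hypothetical service-usage logs: (event sequence, 1 = malicious)
dataset = [
    (["login", "read_mail", "logout"], 0),
    (["login", "escalate_priv", "exfiltrate"], 1),
]
benign_events = ["read_mail", "open_doc", "heartbeat"]
augmented = augment_with_attacks(dataset, benign_events)
for sequence, label in augmented:
    print(label, sequence)
```

Inserting rather than replacing events keeps the malicious behavior intact in every variant, so the augmented label stays correct by construction.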

ai model, attack data, deception attack, (14 more...)


Country:

Asia (0.05)
Africa > Ghana (0.05)

Genre: Press Release (0.55)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Applied AI (0.55)


DARPA snags Intel to lead its machine learning security tech – TechCrunch

#artificialintelligence, Apr-15-2020, 23:25:39 GMT

DARPA, the U.S. military's research wing, has chosen chip maker Intel to lead a new initiative aimed at improving cyber-defenses against deception attacks on machine learning models. Machine learning is a kind of artificial intelligence that allows systems to improve over time with new data and experience. One of its most common uses today is object recognition, such as taking a photo and describing what is in it. That can help people with impaired vision know what is in a photo they cannot see, for example, but it can also be used by other computers, such as autonomous vehicles, to identify what is on the road. But deception attacks, although rare, can meddle with machine learning algorithms.
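A deception attack on a recognition model is often demonstrated with the fast gradient sign method (FGSM), which nudges every input feature by a small amount ε in the direction that increases the model's loss. The sketch below applies FGSM to a toy logistic-regression classifier with hand-picked weights rather than a real vision model; the weights, the input, and ε are illustrative assumptions, and nothing here reflects the methods of the DARPA/Intel initiative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method for binary cross-entropy loss.

    For logistic regression the gradient of the loss with respect to the
    input is (p - y) * w, so each feature moves by eps in the sign of that
    gradient and is then clipped back to the valid [0, 1] range.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0, 0.5], -0.5  # hypothetical trained weights
x, y = [0.6, 0.2, 0.4], 1      # clean input with true label 1
x_adv = fgsm(w, b, x, y, eps=0.25)
print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

With these toy numbers the clean input scores above 0.5 (class 1), while the perturbed input, which differs by at most 0.25 per feature, scores below 0.5 and is misclassified.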

darpa snag intel, intel, techcrunch, (6 more...)


Country: North America > United States (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)
